2024-07-30
What is feature selection? Why do we care?
forward, backward, and stepwise variable selection in linear models.
Univariate screening procedures
Advanced feature selection methods:
Classification of methods
CFS’s feature subset evaluation function is:
\(\text{Ms} = \frac{k\bar{\text{rcf}}}{\sqrt k + k(k - 1)\bar{\text{rff}}}\)
where Ms is the heuristic ”merit” of a feature subset S containing k features, \(\bar{\text{rcf}}\) is the mean feature-class correlation and \(\bar{\text{rff}}\) is the average feature - feature intercorrelation.
Steps:
For an unknown input vector x, a linear classifier takes the form
\(y(x)=\)sign\((w⋅x−b)\)
For LS-SVM, substituting the KKT conditions into the Lagrangian gives the objective function
\(L = -\frac{1}{2} \sum \sum \alpha_i \alpha_j (K(x_i, x_j) + \frac{\sigma_{ij}}{C}) + \sum \alpha_i y_i\)
Where \({\sigma_{ij}} = 1\) if \(i=j\) and \(0\) otherwise.
The ranking criterion can be obtained by:
\(D(L^{-m}) = -\frac{1}{2} \sum \sum \alpha_i \alpha_j (K(x_i, x_j) - (K(x_i^{-m}, x_j^{-m})\)
Where \(x_i^{-m}\) , \(x_j^{-m}\) are the vectors in which \(m\)-th feature has been removed.
The LASSO estimate is defined by the solution to the l1 optimization problem
minimize \(\frac{\| Y - X\beta \|_2^2}{n}\) subject to \(X \sum_{j=1}^{k} \|\beta_j\|_1 < t\)
where \(t\) is the upper bound for the sum of the coefficients
Step 1: filtering using the correlation coefficient. (e.g. trim the “fat”low hanging fruit”) Step 2: remove variables iteratively using RFE.
“Bag of words”
Three metrics:
Accuracy on the test set
difference between accuracy on the training and test set (overtraining)
number of variables selected (model complexity)
Baseline: Full logistics model with no feature selection CFS: Select best of 20 correlation thresholds using cross validation RFE: Select best of 20 sized subsets using cross validation LASSO: Select best penalty term using cross validation CFS + FRE: Remove 20% of variables with lowest correlation, Select best of 20 sized subsets using cross validation
| Method | Number of Features | Test Accuracy Score | Accuracy Decrease from Train |
|---|---|---|---|
| Baseline | 1653 | 0.915 | 0.070 |
| CFS | 126 | 0.936 | -0.004 |
| RFE | 1144 | 0.911 | 0.067 |
| LASSO | 50 | 0.903 | -0.006 |
| CFS + RFE | 683 | 0.920 | 0.045 |
| Method | Number of Features | Test Accuracy Score | Accuracy Decrease from Train |
|---|---|---|---|
| Baseline | 4820 | 0.846 | 0.097 |
| CFS | 1449 | 0.868 | 0.042 |
| RFE | 2980 | 0.863 | 0.059 |
| LASSO | 4472 | 0.880 | 0.045 |
| CFS + RFE | 2672 | 0.778 | 0.088 |